A Uniform Architecture Design for Accelerating 2D and 3D CNNs on FPGAs
نویسندگان
چکیده
منابع مشابه
Accelerating WiMAX System Design with FPGAs
WiMAX, or the IEEE 802.16 standard for broadband wireless access, is increasingly gaining in popularity as a technology with significant market potential. This paper first provides an overview of the existing and developing 802.16 standards and their key differences/applications. The PHY and MAC layers of a typical WiMAX base station are then described. The associated implementation challenges ...
متن کامل3D Recursive Gaussian IIR on GPUs and FPGAs A Case Study for Accelerating Bandwidth-Bounded Applications
GPU devices typically have a higher off-chip bandwidth than FPGA-based systems. Thus typically GPU should perform better for bandwidth-bounded massive parallel applications. In this paper we present our implementations of a 3D recursive Gaussian IIR on multicore CPU, many-core GPU and multi-FPGA platforms. Our baseline implementation on the CPU features the smallest arithmetic computation (2 MA...
متن کاملCan Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet?
The purpose of this study is to determine whether current video datasets have sufficient data for training very deep convolutional neural networks (CNNs) with spatio-temporal three-dimensional (3D) kernels. Recently, the performance levels of 3D CNNs in the field of action recognition have improved significantly. However, to date, conventional research has only explored relatively shallow3Darch...
متن کاملTowards a Multi-array Architecture for Accelerating Large-scale Matrix Multiplication on FPGAs
Large-scale floating-point matrix multiplication is a fundamental kernel in many scientific and engineering applications. Most existing work only focus on accelerating matrix multiplication on FPGA by adopting a linear systolic array. This paper towards the extension of this architecture by proposing a scalable and highly configurable multi-array architecture. In addition, we propose a work-ste...
متن کاملAccelerating BLAS on Custom Architecture through Algorithm-Architecture Co-design
Basic Linear Algebra Subprograms (BLAS) play key role in high performance and scientific computing applications. Experimentally, yesteryear multicore and General Purpose Graphics Processing Units (GPGPUs) are capable of achieving up to 15 to 57% of the peak performance at 65W to 240W of power respectively in underlying platform for compute bound operations like Double/Single Precision General M...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
ژورنال
عنوان ژورنال: Electronics
سال: 2019
ISSN: 2079-9292
DOI: 10.3390/electronics8010065